Dependency graph #659
base: main
Signed-off-by: Polina Binder <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found.
@@           Coverage Diff           @@
##             main     #659   +/-   ##
=======================================
  Coverage   86.75%   86.75%
=======================================
  Files         118      118
  Lines        7059     7059
=======================================
  Hits         6124     6124
  Misses        935      935
scripts/dependency_graph.py
Outdated
    return pyproject_files


def parse_dependencies(pyproject_path):
Do you think we could also parse tach.toml and possibly warn if the dependency graphs are different? The tach check actually enforces this separation during CI, so it's probably more accurate.
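A comparison like the one suggested here could look like the following minimal sketch. It assumes both files have already been parsed into plain mappings from a sub-package name to the set of packages it depends on. The function names `diff_dependency_graphs` and `warn_on_mismatch` and the sample package names are hypothetical and not taken from the PR.

```python
import warnings


def diff_dependency_graphs(pyproject_graph, tach_graph):
    """Return {package: (only_in_pyproject, only_in_tach)} for every mismatch."""
    mismatches = {}
    for package in sorted(set(pyproject_graph) | set(tach_graph)):
        declared = set(pyproject_graph.get(package, ()))
        enforced = set(tach_graph.get(package, ()))
        if declared != enforced:
            mismatches[package] = (declared - enforced, enforced - declared)
    return mismatches


def warn_on_mismatch(pyproject_graph, tach_graph):
    """Emit one warning per package whose two dependency sets differ."""
    mismatches = diff_dependency_graphs(pyproject_graph, tach_graph)
    for package, (only_pyproject, only_tach) in mismatches.items():
        warnings.warn(
            f"{package}: only in pyproject.toml={sorted(only_pyproject)}, "
            f"only in tach.toml={sorted(only_tach)}"
        )
    return mismatches


# Hypothetical sample data, not the real bionemo graph.
pyproject_graph = {"bionemo-a": {"bionemo-core"}, "bionemo-b": {"bionemo-core"}}
tach_graph = {"bionemo-a": {"bionemo-core"}, "bionemo-b": {"bionemo-core", "bionemo-a"}}
```

Warning rather than failing keeps the graph-drawing script usable even when the two sources have drifted, while still surfacing the drift.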
Is there a check in place to ensure that the tach.toml and pyproject.toml files are up-to-date and valid in terms of dependencies? For example, does PyPI installation and importing all subpackages automatically verify this?
We could implement regular checks or enforcement in the CI pipeline. If the above method isn't sufficient, we can create a script that parses the pyproject.toml files of the main project and its subpackages, extracts the import paths used in Python scripts under the src directory of each subpackage, and verifies all imports starting with "from bionemo".
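The import check described in this comment could be sketched roughly as follows, using the standard-library `ast` module. This is an illustration under stated assumptions, not the PR's implementation: the helper names `bionemo_imports` and `undeclared_imports` are hypothetical, and treating the first two dotted components (for example `bionemo.core`) as the unit of dependency is an assumption about how sub-packages map to import paths.

```python
import ast


def bionemo_imports(source):
    """Return the set of second-level bionemo modules imported by `source`.

    For example, `from bionemo.core.utils import x` yields "bionemo.core".
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as `from . import helpers`.
            names = [node.module]
        else:
            continue
        for name in names:
            parts = name.split(".")
            if parts[0] == "bionemo" and len(parts) > 1:
                found.add(".".join(parts[:2]))
    return found


def undeclared_imports(source, declared):
    """Return bionemo imports in `source` that are missing from `declared`."""
    return bionemo_imports(source) - set(declared)


# Hypothetical module text, used only to exercise the helpers.
example = (
    "import os\n"
    "import bionemo.llm.model\n"
    "from bionemo.core.utils import dtypes\n"
    "from . import helpers\n"
)
```

Applied to every file under a sub-package's src directory, the union of `undeclared_imports` results would list imports that the declared dependencies do not cover.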
Dorota -- I suggest we constrain scope for now to just drawing the dependency graph.
I agree it's a good idea to do what you're describing, but it would increase the scope of this substantially and at the moment there's other stuff we gotta do :)
@trvachov, @polinabinder1, in that case it would be good to track this in a GitHub issue and in JIRA, and to add a warning note that this method does not ensure the correctness of the dependency graph and that the method proposed in my comment, or an alternative tool, should be added to complete this task.
The new code gets the tach.toml dependencies and checks that the code imports in the sub-packages are correct based on what is in the pyproject.toml and tach.toml files.
I am not sure pyproject.toml is still the up-to-date source of dependency information, or how it is maintained.
If it is not, I think we should implement a script that parses the dependency graph for the subpackages from the scripts, i.e. parse the pyproject.toml files of the main project and its subpackages, extract the import paths used in Python scripts under the src directory of each subpackage, and verify all imports starting with "from bionemo".
To me this looks like a good start; we just need to document how to run the script in the GitHub PR description. I don't necessarily need this to do any more than it currently does. @pstjohn @dorotat-nv, I suggest we constrain our review to just "graph drawing" rather than any sort of .py file parsing + CI enforcement.
Could this script be relocated under internal/scripts?
ccab862 to 94fd399
Signed-off-by: Polina Binder <[email protected]>
94fd399 to ad206c4
Signed-off-by: Polina Binder <[email protected]>
Done!
This looks great, the module was laid out nicely and you have a lot of clear, reusable functions.
I do think we need unit tests for these functions though. Copilot could likely do that pretty quickly. Ideally any public function / class / method should get a test for all its expected behavior and edge cases, but even just a basic test for each of these functions would be great.
I also think we should add the resulting images to our documentation, along with the command to run this script and regenerate them.
Unfortunately, putting this in internal/scripts means it's harder to write tests for; we don't execute pytest in that subdirectory. Maybe this could live in bionemo-fw? Or we could add that directory to our pytest call. But I'm worried that if we don't exercise this script in CI, it will break quickly without us realizing it, and we won't be able to use it in planning a version bump / release strategy.
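As an illustration of the kind of basic test being asked for, here is a minimal sketch. The function `find_pyproject_files` is a hypothetical stand-in written for this example (the PR's real helpers and their signatures are not shown in this thread), and the tests use `tempfile` so they run without any fixtures; under pytest, the `tmp_path` fixture would be the more usual choice.

```python
import tempfile
from pathlib import Path


def find_pyproject_files(root):
    """Hypothetical stand-in: return sorted pyproject.toml paths under `root`."""
    return sorted(Path(root).rglob("pyproject.toml"))


def test_finds_nested_pyproject_files():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("pkg-a", "pkg-b"):
            (root / name).mkdir()
            (root / name / "pyproject.toml").write_text("[project]\n")
        # A file that must not be picked up.
        (root / "pkg-a" / "README.md").write_text("not a pyproject file\n")
        found = find_pyproject_files(root)
        assert [p.parent.name for p in found] == ["pkg-a", "pkg-b"]


def test_empty_directory_yields_no_files():
    with tempfile.TemporaryDirectory() as tmp:
        assert find_pyproject_files(tmp) == []
```

Functions named `test_*` are collected by pytest automatically, so tests of this shape would run in CI once the script's directory is included in the pytest call.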
This generates a dependency graph between the bionemo sub-packages. It also parses the source files to check that the imports between sub-packages agree with the dependencies declared in the pyproject.toml files.
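For readers unfamiliar with this kind of tool, the graph-drawing step can be pictured with the sketch below, which renders a dependency mapping as Graphviz DOT text. It is only an illustration: whether the PR uses DOT or another drawing backend is not stated here, and the function name `to_dot` and the sample packages are hypothetical.

```python
def to_dot(graph, name="dependencies"):
    """Render {package: iterable_of_dependencies} as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    # Declare every node, including packages that only appear as dependencies.
    nodes = set(graph)
    for deps in graph.values():
        nodes.update(deps)
    for node in sorted(nodes):
        lines.append(f'    "{node}";')
    # One directed edge per (package -> dependency) pair.
    for package in sorted(graph):
        for dep in sorted(graph[package]):
            lines.append(f'    "{package}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample graph, not the real bionemo layout.
sample = {"bionemo-b": ["bionemo-core", "bionemo-a"], "bionemo-a": ["bionemo-core"]}
dot_text = to_dot(sample)
```

Sorting nodes and edges keeps the output deterministic, which makes regenerated graphs easy to diff between releases.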